Forward Semi-supervised Feature Selection

Authors

  • Jiangtao Ren
  • Zhengyuan Qiu
  • Wei Fan
  • Hong Cheng
  • Philip S. Yu
Abstract

Traditionally, feature selection methods work directly on labeled examples. However, the availability of labeled examples cannot be taken for granted in many real-world applications, such as medical diagnosis, forensic science, and fraud detection, where labeled examples are hard to find. This practical problem calls for “semi-supervised feature selection”: choosing, from both labeled and unlabeled examples, the set of features that yields the most accurate classifier for a given learning algorithm. In this paper, we introduce a “wrapper-type” forward semi-supervised feature selection framework. In essence, it uses unlabeled examples to extend the initial labeled training set. Extensive experiments on publicly available datasets show that our proposed framework generally outperforms both traditional supervised and state-of-the-art “filter-type” semi-supervised feature selection algorithms [5] by 1% to 10% in accuracy.
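The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a nearest-centroid classifier, simple interleaved cross-validation as the wrapper criterion, and pseudo-labeling of all unlabeled examples at each step; every function name and the toy data are hypothetical.

```python
def centroid_fit(X, y, feats):
    """Fit a nearest-centroid model using only the feature indices in feats."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_predict(model, row, feats):
    """Return the class whose centroid is closest (ties go to the smallest label)."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)


def cv_accuracy(X, y, feats, folds=3):
    """Wrapper criterion: cross-validated accuracy on the given feature subset."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train], feats)
        correct += sum(centroid_predict(model, X[i], feats) == y[i] for i in test)
    return correct / len(X)


def forward_semi_supervised_select(X_lab, y_lab, X_unl, n_select):
    """Greedy forward selection; unlabeled rows extend the training set."""
    selected = []
    remaining = list(range(len(X_lab[0])))
    while remaining and len(selected) < n_select:
        if selected:
            # Assumption: pseudo-label every unlabeled row with the current subset.
            model = centroid_fit(X_lab, y_lab, selected)
            pseudo = [centroid_predict(model, r, selected) for r in X_unl]
            X_ext, y_ext = X_lab + X_unl, y_lab + pseudo
        else:
            # No features chosen yet, so only the labeled rows can be used.
            X_ext, y_ext = X_lab, y_lab
        best = max(remaining, key=lambda f: cv_accuracy(X_ext, y_ext, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 separates the two classes, features 0 and 2 are constant.
X_lab = [[0.5, 10 * (i % 2) + 0.01 * i, 0.5] for i in range(6)]
y_lab = [i % 2 for i in range(6)]
X_unl = [[0.5, 0.1, 0.5], [0.5, 9.9, 0.5]]

print(forward_semi_supervised_select(X_lab, y_lab, X_unl, 1))  # prints [1]
```

A wrapper criterion such as `cv_accuracy` scores each candidate subset with the classifier itself, which is what distinguishes this style from the “filter-type” methods the abstract compares against.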

Similar resources

Cluster homogeneity as a semi-supervised principle for feature selection using mutual information

In this work the principle of homogeneity between labels and data clusters is exploited in order to develop a semi-supervised Feature Selection method. This principle permits the use of cluster information to improve the estimation of feature relevance in order to increase selection performance. Mutual Information is used in a Forward-Backward search process in order to evaluate the relevance o...

Graph Laplacian for Semi-supervised Feature Selection in Regression Problems

Feature selection is fundamental in many data mining or machine learning applications. Most of the algorithms proposed for this task make the assumption that the data are either supervised or unsupervised, while in practice supervised and unsupervised samples are often simultaneously available. Semi-supervised feature selection is thus needed, and has been studied quite intensively these past f...

A Convex Formulation for Semi-Supervised Multi-Label Feature Selection

The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, such that th...

Semi-supervised Feature Selection via Spectral Analysis

Feature selection is an important task in effective data mining. A new challenge to feature selection is the so-called “small labeled-sample problem”, in which labeled data is scarce and unlabeled data is abundant. The paucity of labeled instances provides insufficient information about the structure of the target concept, and can cause supervised feature selection algorithms to fail. Unsupervised fe...

Semi-Supervised Feature Selection with Constraint Sets

In machine learning, classification and recognition are crucial tasks. Any object is recognized with the help of the features associated with it. Among many features, only some help to classify an object correctly. Feature selection is a useful technique for detecting such specific features. Feature selection is the process of selecting a subset of features to reduce the number of features (dimensionality reduction)...

Publication date: 2008